The goal of this file is to arrange gene-centric Caenorhabditis RNA-seq data originally published as part of the modENCODE project or by the Waterston Lab at the University of Washington (Gerstein et al., 2010; Gerstein et al., 2014; Boeck et al., 2016; Warner et al., 2019).
Each genome and GFF file was downloaded from WormBase.org version WS290. Reads were aligned to each genome using STAR (v2.7.6a, --alignIntronMax 30000 --alignMatesGapMax 30000) and the species-specific WS290 GTF file for each genome. PCR duplicates were removed using "seldup" (Warner et al., 2019). Read counts were obtained for each gene (CDS regions only; these are labeled "CDS" in C. briggsae and C. elegans and "coding_exon" in C. remanei, C. japonica, and C. brenneri) using featureCounts (from the subread-2.0.6 software package) with default settings. Only uniquely mapping reads were counted. Additionally, read counts were obtained for the CDS regions and for the full transcripts using the featureCounts options -M --fraction, so that multimapping reads were counted fractionally, split equally among all of the locations where they aligned. Alignment and counting were performed by LaDeana Hillier (Waterston Lab, UW).
Read data for each species were imported into R and annotated with information from WormBase ParaSite BioMart.
Raw reads were quantified as counts per million (CPM) using the edgeR package, then filtered to remove genes with low counts (fewer than 1 CPM in at least 2 samples for C. elegans and C. briggsae, or in at least 1 sample for the other species). A list of discarded genes and their expression values across life stages was saved. The remaining gene values were normalized using the trimmed mean of M-values method (TMM; Robinson and Oshlack, 2010) to permit between-sample comparisons.
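The filtering and TMM steps above can be sketched with edgeR on a toy count matrix; the matrix, sample names, and group labels below are illustrative only, and the ">1 CPM in at least 2 samples" threshold mirrors the C. elegans / C. briggsae rule described here.

```r
# Minimal sketch of the filtering/TMM steps, using edgeR on toy counts.
library(edgeR)

counts <- matrix(c(10,  12,  0,
                   0,   1,   0,
                   500, 480, 510),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("s1", "s2", "s3")))
dge <- DGEList(counts = counts, group = c("L1", "L1", "L2"))

# keep genes with >1 CPM in at least 2 samples
keepers <- rowSums(cpm(dge) > 1) >= 2
dge.filtered <- dge[keepers, ]   # keeps geneA and geneC; geneB is dropped

# TMM normalization (Robinson and Oshlack, 2010) for between-sample comparisons
dge.norm <- calcNormFactors(dge.filtered, method = "TMM")
log2.cpm <- cpm(dge.norm, log = TRUE)
```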
For C. briggsae and C. elegans, samples contain significant numbers of biological replicates, so these samples were additionally pre-processed for differential gene expression analyses. The mean-variance relationship was modeled using a precision-weights approach (Law et al., 2014) via the voom function in the limma package. This function produced a variance-stabilized EList object that includes precision weights for each gene to control for heteroscedasticity, as well as normalized expression values on the log2 scale (log2 counts per million).
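As a minimal, self-contained sketch of the voom step just described: the toy counts, sample names, and group labels below are invented for illustration, but the design-matrix setup and the voom call mirror what the code later in this document does.

```r
library(edgeR)
library(limma)

# toy inputs (illustrative only): 4 samples, 2 life stages, 100 genes
set.seed(1)
counts <- matrix(rpois(400, lambda = 50), nrow = 100,
                 dimnames = list(paste0("g", 1:100), paste0("s", 1:4)))
targets <- data.frame(sample = paste0("s", 1:4),
                      group  = c("embryo", "embryo", "L1", "L1"))
dge.norm <- calcNormFactors(DGEList(counts), method = "TMM")

group <- factor(targets$group)
design <- model.matrix(~0 + group)   # no-intercept design: one column per stage
colnames(design) <- levels(group)

# voom returns an EList: $E holds log2 CPM values and $weights holds the
# per-observation precision weights that lmFit() uses downstream to
# account for heteroscedasticity
v <- voom(counts = dge.norm, design = design, plot = FALSE)
```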
For C. remanei, C. japonica, and C.
brenneri, the lack of consistent biological replicates precludes
limma::voom processing and downstream differential
expression calculations. To enable app users to visualize gene count
data, a DGEList object was manually constructed that contained filtered,
normalized log2 CPM values, gene annotations information, and sample
information. This file can be loaded into the RNA-seq Browser for
plotting.
For C. elegans embryonic timeline data, samples are from four previously published time series. Biological replicates are defined by a previously established inferred timeline that unified samples across these embryo time series (Boeck et al., 2016). Group names represent the average time for biological replicate pairs, which were selected to represent 80-minute developmental intervals. This timing was chosen to match previously published differential expression analyses.
This document saves multiple files that are passed to a Shiny web app for downstream browsing and on-demand analysis. Note that these files are saved in an Outputs folder; to make them accessible to a local version of the Shiny browser, they need to be moved to the appropriate subfolders within the App folder: the www subfolder (for .csv files) or the Data subfolder (for R objects). Stable copies are already located within those folders and do not need to be replaced unless the pre-processing steps change.
Note: Code chunks are collated and echoed at the end of the document in Appendix I.
Generate a digital gene expression (DGE) list that can be easily shared and loaded for downstream filtering and normalization.
species_list <- tibble(species = c('elegans', 'briggsae', 'brenneri', 'remanei', 'japonica'))
for (x in species_list$species) {
# read in the study design ----
targets <- read_tsv(paste0("./Data/", x, "/", x,"_study_design.txt"),
na = c("", "NA", "na"), show_col_types = F)
# load pre-generated annotation information
load(paste0("./Outputs/",x,"_geneAnnotations"))
# import featureCount output into R ----
if (x == 'elegans' | x == 'briggsae') {
path <- paste0("./Data/", x, "/featureCount.C_", x,".", targets$Biological_ID, ".CDS.unique_only.ws290.txt")
} else {
path <- paste0("./Data/", x, "/featureCount.C_", x,".", targets$Biological_ID, ".coding_exon.unique_only.ws290.txt")
}
featureCountData<- rbindlist(lapply(path, fread), idcol="sample") %>%
mutate(sample = targets$sample[sample])
colnames(featureCountData)<-c('sample','geneID', 'geneName', 'stableID', "location", "length", "count")
featureCountData_wider <- featureCountData %>%
dplyr::select(!c(location, length)) %>%
pivot_wider(names_from = sample, values_from = count)
counts <- featureCountData_wider %>%
dplyr::select(-c(stableID, geneName))%>%
column_to_rownames(var = "geneID")
annotations_sub<-dplyr::select(featureCountData_wider, geneID) %>%
left_join(annotations, by = "geneID")
# generate a DGEList
myDGEList <- DGEList(counts,
samples = targets$sample,
group = targets$group,
genes = annotations_sub)
# Save outputs
# Save myDGEList for downstream filtering
output.name <- paste0(x, '_DGEList')
save(myDGEList,
file = file.path(output.path,
output.name))
}
The goal of this chunk is to use ggplot2 to visualize the impact of filtering and normalization on the data.
species_list <- tibble(species = c('elegans', 'briggsae', 'brenneri', 'remanei', 'japonica'))
for (x in species_list$species) {
# load pre-generated DGEList information
load(paste0("./Outputs/",x,"_DGEList"))
# load pre-generated annotation information
load(paste0("./Outputs/",x,"_geneAnnotations"))
# read in the study design ----
targets <- read_tsv(paste0("./Data/", x, "/", x,"_study_design.txt"),
na = c("", "NA", "na"), show_col_types = F)
# Generate life stage IDs
ids <- rep(cbind(targets$group),
times = nrow(myDGEList$counts)) %>%
as_factor()
# calculate and plot log2 counts per million ----
# use the 'cpm' function from EdgeR to get log2 counts per million
# then coerce into a tibble
log2.cpm.df.pivot <-cpm(myDGEList, log=TRUE) %>%
as_tibble(rownames = "geneID") %>%
setNames(nm = c("geneID", targets$sample)) %>%
pivot_longer(cols = -geneID,
names_to = "samples",
values_to = "expression") %>%
add_column(life_stage = ids)
# plot the data
p1 <- ggplot(log2.cpm.df.pivot) +
aes(x=samples, y=expression, fill=life_stage) +
geom_violin(trim = FALSE, show.legend = T, alpha= 0.7) +
stat_summary(fun = "median",
geom = "point",
shape = 20,
size = 2,
color = "black",
show.legend = FALSE) +
labs(y="log2 expression", x = "sample",
title = paste0("C. ", x, ": Log2 Counts per Million (CPM)"),
subtitle="unfiltered, non-normalized",
fill= "Life stage",
caption=paste0("produced on ", Sys.time())) +
theme_bw() +
theme(text = element_text(size = 10),
title = element_text(size = 10),
axis.text.y = element_text (size = 5)) +
coord_flip()
# Filter the data ----
# filter genes/transcripts with low counts
# how many genes had more than 1 CPM (TRUE) in at least n samples
# Note: The cutoff "n" is adjusted for the number of
# samples in the smallest group of comparison.
if (x == 'elegans' | x == 'briggsae') {
keepers <- rowSums(cpm(myDGEList) > 1) >= 2
} else {
keepers <- rowSums(cpm(myDGEList) > 1) >= 1
}
myDGEList.filtered <- myDGEList[keepers,]
ids.filtered <- rep(cbind(targets$group),
times = nrow(myDGEList.filtered)) %>%
as_factor()
log2.cpm.filtered.df.pivot <- cpm(myDGEList.filtered, log=TRUE) %>%
as_tibble(rownames = "geneID") %>%
setNames(nm = c("geneID", targets$sample)) %>%
pivot_longer(cols = -geneID,
names_to = "samples",
values_to = "expression") %>%
add_column(life_stage = ids.filtered)
p2 <- ggplot(log2.cpm.filtered.df.pivot) +
aes(x=samples, y=expression, fill=life_stage) +
geom_violin(trim = FALSE, show.legend = T, alpha= 0.7) +
stat_summary(fun = "median",
geom = "point",
shape = 20,
size = 2,
color = "black",
show.legend = FALSE) +
labs(y="log2 expression", x = "sample",
title = paste0("C. ", x, ": Log2 Counts per Million (CPM)"),
subtitle="filtered, non-normalized",
fill = "Life stage",
caption=paste0("produced on ", Sys.time())) +
theme_bw() +
theme(text = element_text(size = 10),
title = element_text(size = 10),
axis.text.y = element_text (size = 5)) +
coord_flip()
# Look at the genes excluded by the filtering step ----
# just to check that there aren't any with
# high expression that are in few samples
# Discarded genes
myDGEList.discarded <- myDGEList[!keepers,]
ids.discarded <- rep(cbind(targets$group),
times = nrow(myDGEList.discarded)) %>%
as_factor()
# note: log = FALSE here, so these are linear CPM values despite the object name
log2.cpm.discarded.df.pivot <- cpm(myDGEList.discarded, log = FALSE) %>%
as_tibble(rownames = "geneID") %>%
setNames(nm = c("geneID", targets$sample)) %>%
pivot_longer(cols = -geneID,
names_to = "samples",
values_to = "expression") %>%
add_column(life_stage = ids.discarded)
# Genes that are above 1 cpm
log2.cpm.discarded.df.pivot %>%
dplyr::filter(expression > 1)
# Generate a matrix of discarded genes and their raw counts ----
discarded.gene.df <- log2.cpm.discarded.df.pivot %>%
pivot_wider(names_from = c(life_stage, samples),
names_sep = "-",
values_from = expression,
id_cols = geneID)%>%
left_join(annotations, by = "geneID")
# Plot discarded genes
p.discarded <- ggplot(log2.cpm.discarded.df.pivot) +
aes(x=samples, y=expression, color=life_stage) +
geom_jitter(alpha = 0.3, show.legend = T)+
stat_summary(fun = "median",
geom = "point",
shape = 20,
size = 2,
color = "black",
show.legend = FALSE) +
labs(y="expression", x = "sample",
title = paste0("C. ", x, ": Counts per Million (CPM)"),
subtitle="genes excluded by low count filtering step, non-normalized",
fill = "Life stage",
caption=paste0("produced on ", Sys.time())) +
theme_bw() +
theme(text = element_text(size = 10),
title = element_text(size = 10),
axis.text.y = element_text (size = 5)) +
coord_flip()
# Normalize the data using a between samples normalization ----
# Source for TMM sample normalization here:
# https://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-3-r25
myDGEList.filtered.norm <- calcNormFactors(myDGEList.filtered, method = "TMM")
log2.cpm.filtered.norm <- cpm(myDGEList.filtered.norm, log=TRUE)
log2.cpm.filtered.norm.df<- cpm(myDGEList.filtered.norm, log=TRUE) %>%
as_tibble(rownames = "geneID") %>%
setNames(nm = c("geneID", targets$sample))
log2.cpm.filtered.norm.df.pivot<-log2.cpm.filtered.norm.df %>%
pivot_longer(cols = -geneID,
names_to = "samples",
values_to = "expression") %>%
add_column(life_stage = ids.filtered)
p3 <- ggplot(log2.cpm.filtered.norm.df.pivot) +
aes(x=samples, y=expression, fill=life_stage) +
geom_violin(trim = FALSE, show.legend = T, alpha = 0.7) +
stat_summary(fun = "median",
geom = "point",
shape = 20,
size = 2,
color = "black",
show.legend = FALSE) +
labs(y="log2 expression", x = "sample",
title = paste0("C. ", x, ": Log2 Counts per Million (CPM)"),
subtitle="filtered, TMM normalized",
fill = "Life stage",
caption=paste0("produced on ", Sys.time())) +
theme_bw() +
theme(text = element_text(size = 10),
title = element_text(size = 10),
axis.text.y = element_text (size = 5)) +
coord_flip()
# Save outputs
output.name <- paste0(x, '_FilteringNormalizationGraphs')
save(p1, p2, p3, p.discarded, discarded.gene.df,
file = file.path(output.path,
output.name))
output.name <- paste0(x, '_FilteredNormalizedData')
save(myDGEList.filtered.norm, log2.cpm.filtered.norm,
file = file.path(output.path,
output.name))
# Save a matrix of discarded genes and their raw counts ----
output.name <- paste0(x, '_discardedGenes.csv')
discarded.gene.df %>%
write.csv(file = file.path(output.path,
output.name))
}
For C. elegans and C. briggsae, the goal of this
chunk is to fit the data to a linear model for downstream detection of
differentially expressed genes (DEGs) using limma::voom.
For C. remanei, C. japonica, and C. brenneri,
the goal is to generate an EList object shaped like the one produced by
limma::voom, but containing filtered and normalized (though
not voom-adjusted) log2 CPM values as well as annotation and study design
information.
This chunk saves either the variance-stabilized, filtered and
normalized data (C. elegans and C. briggsae) or the
filtered and normalized data (C. remanei, C. japonica,
and C. brenneri) as an R object with the suffix
_DGEList. It also saves a matrix of processed count data as
a .csv file for downloading from the App.
species_list <- tibble(species = c('elegans', 'briggsae', 'brenneri', 'remanei', 'japonica'))
for (x in species_list$species) {
# load pre-generated filtered and normalized data
load(paste0("./Outputs/",x,"_FilteredNormalizedData"))
# read in the study design ----
targets <- read_tsv(paste0("./Data/", x, "/", x,"_study_design.txt"),
na = c("", "NA", "na"), show_col_types = F)
if (x == 'elegans' | x == 'briggsae'){
# Compute Variance-Stabilized DGEList Object ----
# Set up the design matrix ----
# no intercept/blocking for matrix, comparisons across group
group <- factor(targets$group)
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)
# Model mean-variance trend and fit linear model to data ----
# Use VOOM function from Limma package to model the mean-variance relationship
# produces a variance-stabilized EList that includes precision
# weights for each gene to control for heteroscedasticity.
# transforms count data to log2-counts per million
# Outputs: E = normalized expression values on the log2 scale
# Note: this object is being called 'DGEList.filtered.norm' instead of
# 'v.DGEList.filtered.norm' to make it easier to work with non-variance stabilized data
# from other species.
DGEList.filtered.norm <- voom(counts = myDGEList.filtered.norm,
design = design, plot = T)
colnames(DGEList.filtered.norm)<-targets$sample
colnames(DGEList.filtered.norm$E) <- paste(targets$group,
targets$uniqueness_suffix,sep = '-')
# Save matrix of genes and their filtered, normalized, voom-transformed counts in Shiny app download (www) folder ----
# This is the count data that underlies the differential expression analyses in the Shiny app.
output.name <- paste0(x, '_log2cpm_filtered_norm_voom.csv')
write.csv(DGEList.filtered.norm$E,
file = file.path(www.path,
output.name))
# Save variance-stabilized DGEList in Shiny app Data folder----
output.name <- paste0(x, '_DGEList')
save(DGEList.filtered.norm,
file = file.path(app.path,
output.name))
} else{
DGEList.filtered.norm <- new("EList", list (genes = myDGEList.filtered.norm$genes, targets = myDGEList.filtered.norm$samples, E = log2.cpm.filtered.norm))
colnames(DGEList.filtered.norm)<-targets$sample
colnames(DGEList.filtered.norm$E) <- paste(targets$group,
targets$uniqueness_suffix,sep = '-')
# Save matrix of genes and their filtered, normalized counts in Shiny app download (www) folder----
output.name <- paste0(x, '_log2cpm_filtered_norm.csv')
write.csv(DGEList.filtered.norm$E,
file = file.path(www.path,
output.name))
# Save DGEList in Shiny app Data folder----
output.name <- paste0(x, '_DGEList')
save(DGEList.filtered.norm,
file = file.path(app.path,
output.name))
}
}
## Multivariate Analysis

The goal of this chunk is to implement hierarchical clustering and PCA
analyses, plotting a dendrogram of the hierarchical clustering, a PCA
plot for the first two principal components, and a PCA small-multiples
plot. These analyses are performed on either variance-stabilized,
filtered and normalized data (C. elegans, C. briggsae) or filtered and
normalized data (C. brenneri, C. remanei, C. japonica).
species_list <- tibble(species = c('elegans', 'briggsae', 'brenneri', 'remanei', 'japonica'))
for (x in species_list$species) {
# read in DGElist object
output.name <- paste0(x, '_DGEList')
load(file = file.path(app.path, output.name))
# read in the study design ----
targets <- read_tsv(paste0("./Data/", x, "/", x,"_study_design.txt"),
na = c("", "NA", "na"), show_col_types = F)
# Perform multivariate analysis on the data
# Identify variables of interest in study design file ----
group <- factor(targets$group)
# Hierarchical clustering ---------------
# Remember: hierarchical clustering can only work on a data matrix, not a data frame
# Calculate distance matrix
# dist calculates distance between rows, so transpose data so that we get distance between samples.
# how similar are samples from each other
distance <- dist(t(DGEList.filtered.norm$E), method = "maximum") #other distance methods are "euclidean", "manhattan", "canberra", "binary", or "minkowski"
# Calculate clusters to visualize differences. This is the hierarchical clustering.
# The methods here include: single (i.e. "friends-of-friends"), complete (i.e. complete linkage), and average (i.e. UPGMA). Here's a comparison of different types: https://en.wikipedia.org/wiki/UPGMA#Comparison_with_other_linkages
clusters <- hclust(distance, method = "complete") #other agglomeration methods are "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", or "centroid"
dend <- as.dendrogram(clusters)
p4<-dend %>%
dendextend::set("hang_leaves", c(0.1)) %>%
dendextend::set("labels_cex", c(0.5)) %>%
dendextend::set("branches_lwd", c(0.7)) %>%
as.ggdend %>%
ggplot (offset_labels = -0.5) +
theme_dendro() +
ylim(-8, max(get_branches_heights(dend))) +
labs(title = "Hierarchical Cluster Dendrogram ",
subtitle = "filtered, TMM normalized",
y = "Distance",
x = "Life stage") +
coord_fixed(1/2) +
theme(axis.title.x = element_text(color = "black"),
axis.title.y = element_text(angle = 90),
axis.text.y = element_text(angle = 0),
axis.line.y = element_line(color = "black"),
axis.ticks.y = element_line(color = "black"),
axis.ticks.length.y = unit(2, "mm"))
# Principal component analysis (PCA) -------------
# this also works on a data matrix, not a data frame
pca.res <- prcomp(t(DGEList.filtered.norm$E), scale.=F, retx=T)
summary(pca.res) # Prints variance summary for all principal components.
#pca.res$rotation # 'rotation' shows how much each gene influenced each PC (the loadings)
#pca.res$x # 'x' shows where each sample falls along each PC (the scores)
#note that these have a magnitude and a direction (this is the basis for making a PCA plot)
## This generates a scree plot: a standard way to view the eigenvalues of a PCA. Shows the proportion of variance accounted for by each PC. Plotting only the first 10 dimensions.
p5<-fviz_eig(pca.res,
#barcolor = brewer.pal(8,"Pastel2")[8],
#barfill = brewer.pal(8,"Pastel2")[8],
linecolor = "black",
main = "Scree plot: proportion of variance accounted for by each principal component", ggtheme = theme_bw())
pc.var<-pca.res$sdev^2 # sdev^2 captures these eigenvalues from the PCA result
pc.per<-round(pc.var/sum(pc.var)*100, 1) # we can then use these eigenvalues to calculate the percentage variance explained by each PC
# Visualize the PCA result ------------------
#let's first plot any two PCs against each other
#We know where each sample falls along each PC (scores), so let's plot
pca.res.df <- as_tibble(pca.res$x[,1:2]) %>%
add_column(group = targets$group)
# Plotting PC1 and PC2
p6<-ggplot(pca.res.df) +
aes(x=PC1, y=PC2,
label = group,
fill = group,
color = group
) +
geom_point(size=4) +
xlab(paste0("PC1 (",pc.per[1],"%",")")) +
ylab(paste0("PC2 (",pc.per[2],"%",")")) +
labs(title="Principal Components Analysis",
sub = "Note: analysis is blind to sample identity.",
fill = "Life stage",
color = "Life stage") +
scale_x_continuous(expand = c(.3, .3)) +
scale_y_continuous(expand = c(.3, .3)) +
coord_fixed() +
theme_bw() +
theme(text = element_text(size = 10),
title = element_text(size = 10))
# Create a PCA 'small multiples' chart ----
if (ncol(pca.res$x) > 5) {
pca.res.df <- pca.res$x[,1:6]
} else {
pca.res.df <- pca.res$x[,1:ncol(pca.res$x)]
}
pca.res.df <- pca.res.df %>%
as_tibble() %>%
add_column(sample = targets$sample,
group = targets$group)
pca.pivot <- pivot_longer(pca.res.df, # data frame to be pivoted
cols = PC1:PC3, # column names to be stored as a SINGLE variable
names_to = "PC", # name of that new variable (column)
values_to = "loadings") # name of new variable (column) storing all the values (data)
PC1<-subset(pca.pivot, PC == "PC1")
PC2 <-subset(pca.pivot, PC == "PC2")
PC3 <- subset(pca.pivot, PC == "PC3")
#PC4 <- subset(pca.pivot, PC == "PC4")
# New facet label names for PCs
PC.labs <- c(paste0("PC1 (",pc.per[1],"%",")"),
paste0("PC2 (",pc.per[2],"%",")"),
paste0("PC3 (",pc.per[3],"%",")")
)
names(PC.labs) <- c("PC1", "PC2", "PC3")
p7<-ggplot(pca.pivot) +
aes(x=sample, y=loadings) + # you could iteratively 'paint' different co-variates onto this plot using the 'fill' aes
geom_bar(stat="identity") +
facet_wrap(~PC, labeller = labeller(PC = PC.labs)) +
geom_bar(data = PC1, stat = "identity", aes(fill = group)) +
geom_bar(data = PC2, stat = "identity", aes(fill = group)) +
geom_bar(data = PC3, stat = "identity", aes(fill = group)) +
labs(title="PCA 'small multiples' plot",
fill = "Life stage",
caption=paste0("produced on ", Sys.time())) +
scale_x_discrete(limits = targets$sample, labels = targets$group) +
theme_bw() +
theme(text = element_text(size = 5),
title = element_text(size = 5))+
coord_flip()
output.name <- paste0(x, '_PCAGraphs')
save(p4, p5, p6, p7,
file = file.path(output.path,
output.name))
}
Clustering performed on filtered, normalized and variance-stabilized
abundance data using the “complete” method.
A scree plot is a standard way to view the eigenvalues of a PCA. The
plot shows the proportion of variance accounted for by each PC.
Plot of the samples in PCA space. Fill color indicates life
stage.
Clustering performed on filtered, normalized and variance-stabilized
abundance data using the “complete” method.
A scree plot is a standard way to view the eigenvalues of a PCA. The
plot shows the proportion of variance accounted for by each PC.
Plot of the samples in PCA space. Fill color indicates life
stage.
Clustering performed on filtered and normalized abundance data using
the “complete” method.
A scree plot is a standard way to view the eigenvalues of a PCA. The
plot shows the proportion of variance accounted for by each PC.
Plot of the samples in PCA space. Fill color indicates life
stage.
Clustering performed on filtered and normalized abundance data using
the “complete” method.
A scree plot is a standard way to view the eigenvalues of a PCA. The
plot shows the proportion of variance accounted for by each PC.
Plot of the samples in PCA space. Fill color indicates life
stage.
Clustering performed on filtered and normalized abundance data using
the “complete” method.
A scree plot is a standard way to view the eigenvalues of a PCA. The
plot shows the proportion of variance accounted for by each PC.
Plot of the samples in PCA space. Fill color indicates life
stage.
knitr::opts_chunk$set(
message = FALSE,
warning = FALSE,
collapse = TRUE
)
if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager")
if (!require("tximport", quietly = TRUE)) BiocManager::install("tximport", ask = FALSE)
if (!require("pacman", quietly = TRUE)) install.packages("pacman")
pacman::p_load("tidyverse","data.table", "magrittr","edgeR","matrixStats","cowplot","ggthemes","gprofiler2","limma","tximport", "knitr", "ggforce", "ggdendro", "factoextra", "gridExtra", "dendextend")
# Check for presence of output folder, generate if it doesn't exist
output.path <- "./Outputs"
if (!dir.exists(output.path)){
dir.create(output.path)
}
app.path <-"../Data"
www.path <-"../www"
species_list <- tibble(species = c('elegans', 'briggsae', 'brenneri', 'remanei', 'japonica'))
for (x in species_list$species) {
# read in the study design ----
targets <- read_tsv(paste0("./Data/", x, "/", x,"_study_design.txt"),
na = c("", "NA", "na"), show_col_types = F)
# load pre-generated annotation information
load(paste0("./Outputs/",x,"_geneAnnotations"))
# import featureCount output into R ----
if (x == 'elegans' | x == 'briggsae') {
path <- paste0("./Data/", x, "/featureCount.C_", x,".", targets$Biological_ID, ".CDS.unique_only.ws290.txt")
} else {
path <- paste0("./Data/", x, "/featureCount.C_", x,".", targets$Biological_ID, ".coding_exon.unique_only.ws290.txt")
}
featureCountData<- rbindlist(lapply(path, fread), idcol="sample") %>%
mutate(sample = targets$sample[sample])
colnames(featureCountData)<-c('sample','geneID', 'geneName', 'stableID', "location", "length", "count")
featureCountData_wider <- featureCountData %>%
dplyr::select(!c(location, length)) %>%
pivot_wider(names_from = sample, values_from = count)
counts <- featureCountData_wider %>%
dplyr::select(-c(stableID, geneName))%>%
column_to_rownames(var = "geneID")
annotations_sub<-dplyr::select(featureCountData_wider, geneID) %>%
left_join(annotations, by = "geneID")
# generate a DGEList
myDGEList <- DGEList(counts,
samples = targets$sample,
group = targets$group,
genes = annotations_sub)
# Save outputs
# Save myDGEList for downstream filtering
output.name <- paste0(x, '_DGEList')
save(myDGEList,
file = file.path(output.path,
output.name))
}
species_list <- tibble(species = c('elegans', 'briggsae', 'brenneri', 'remanei', 'japonica'))
for (x in species_list$species) {
# load pre-generated DGEList information
load(paste0("./Outputs/",x,"_DGEList"))
# load pre-generated annotation information
load(paste0("./Outputs/",x,"_geneAnnotations"))
# read in the study design ----
targets <- read_tsv(paste0("./Data/", x, "/", x,"_study_design.txt"),
na = c("", "NA", "na"), show_col_types = F)
# Generate life stage IDs
ids <- rep(cbind(targets$group),
times = nrow(myDGEList$counts)) %>%
as_factor()
# calculate and plot log2 counts per million ----
# use the 'cpm' function from EdgeR to get log2 counts per million
# then coerce into a tibble
log2.cpm.df.pivot <-cpm(myDGEList, log=TRUE) %>%
as_tibble(rownames = "geneID") %>%
setNames(nm = c("geneID", targets$sample)) %>%
pivot_longer(cols = -geneID,
names_to = "samples",
values_to = "expression") %>%
add_column(life_stage = ids)
# plot the data
p1 <- ggplot(log2.cpm.df.pivot) +
aes(x=samples, y=expression, fill=life_stage) +
geom_violin(trim = FALSE, show.legend = T, alpha= 0.7) +
stat_summary(fun = "median",
geom = "point",
shape = 20,
size = 2,
color = "black",
show.legend = FALSE) +
labs(y="log2 expression", x = "sample",
title = paste0("C. ", x, ": Log2 Counts per Million (CPM)"),
subtitle="unfiltered, non-normalized",
fill= "Life stage",
caption=paste0("produced on ", Sys.time())) +
theme_bw() +
theme(text = element_text(size = 10),
title = element_text(size = 10),
axis.text.y = element_text (size = 5)) +
coord_flip()
# Filter the data ----
# filter genes/transcripts with low counts
# how many genes had more than 1 CPM (TRUE) in at least n samples
# Note: The cutoff "n" is adjusted for the number of
# samples in the smallest group of comparison.
if (x == 'elegans' | x == 'briggsae') {
keepers <- rowSums(cpm(myDGEList) > 1) >= 2
} else {
keepers <- rowSums(cpm(myDGEList) > 1) >= 1
}
myDGEList.filtered <- myDGEList[keepers,]
ids.filtered <- rep(cbind(targets$group),
times = nrow(myDGEList.filtered)) %>%
as_factor()
log2.cpm.filtered.df.pivot <- cpm(myDGEList.filtered, log=TRUE) %>%
as_tibble(rownames = "geneID") %>%
setNames(nm = c("geneID", targets$sample)) %>%
pivot_longer(cols = -geneID,
names_to = "samples",
values_to = "expression") %>%
add_column(life_stage = ids.filtered)
p2 <- ggplot(log2.cpm.filtered.df.pivot) +
aes(x=samples, y=expression, fill=life_stage) +
geom_violin(trim = FALSE, show.legend = T, alpha= 0.7) +
stat_summary(fun = "median",
geom = "point",
shape = 20,
size = 2,
color = "black",
show.legend = FALSE) +
labs(y="log2 expression", x = "sample",
title = paste0("C. ", x, ": Log2 Counts per Million (CPM)"),
subtitle="filtered, non-normalized",
fill = "Life stage",
caption=paste0("produced on ", Sys.time())) +
theme_bw() +
theme(text = element_text(size = 10),
title = element_text(size = 10),
axis.text.y = element_text (size = 5)) +
coord_flip()
# Look at the genes excluded by the filtering step ----
# just to check that there aren't any with
# high expression that are in few samples
# Discarded genes
myDGEList.discarded <- myDGEList[!keepers,]
ids.discarded <- rep(cbind(targets$group),
times = nrow(myDGEList.discarded)) %>%
as_factor()
# note: log = FALSE here, so these are linear CPM values despite the object name
log2.cpm.discarded.df.pivot <- cpm(myDGEList.discarded, log = FALSE) %>%
as_tibble(rownames = "geneID") %>%
setNames(nm = c("geneID", targets$sample)) %>%
pivot_longer(cols = -geneID,
names_to = "samples",
values_to = "expression") %>%
add_column(life_stage = ids.discarded)
# Genes that are above 1 cpm
log2.cpm.discarded.df.pivot %>%
dplyr::filter(expression > 1)
# Generate a matrix of discarded genes and their raw counts ----
discarded.gene.df <- log2.cpm.discarded.df.pivot %>%
pivot_wider(names_from = c(life_stage, samples),
names_sep = "-",
values_from = expression,
id_cols = geneID)%>%
left_join(annotations, by = "geneID")
# Plot discarded genes
p.discarded <- ggplot(log2.cpm.discarded.df.pivot) +
aes(x=samples, y=expression, color=life_stage) +
geom_jitter(alpha = 0.3, show.legend = T)+
stat_summary(fun = "median",
geom = "point",
shape = 20,
size = 2,
color = "black",
show.legend = FALSE) +
labs(y="expression", x = "sample",
title = paste0("C. ", x, ": Counts per Million (CPM)"),
subtitle="genes excluded by low count filtering step, non-normalized",
fill = "Life stage",
caption=paste0("produced on ", Sys.time())) +
theme_bw() +
theme(text = element_text(size = 10),
title = element_text(size = 10),
axis.text.y = element_text (size = 5)) +
coord_flip()
# Normalize the data using a between samples normalization ----
# Source for TMM sample normalization here:
# https://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-3-r25
myDGEList.filtered.norm <- calcNormFactors(myDGEList.filtered, method = "TMM")
log2.cpm.filtered.norm <- cpm(myDGEList.filtered.norm, log=TRUE)
log2.cpm.filtered.norm.df<- cpm(myDGEList.filtered.norm, log=TRUE) %>%
as_tibble(rownames = "geneID") %>%
setNames(nm = c("geneID", targets$sample))
log2.cpm.filtered.norm.df.pivot<-log2.cpm.filtered.norm.df %>%
pivot_longer(cols = -geneID,
names_to = "samples",
values_to = "expression") %>%
add_column(life_stage = ids.filtered)
p3 <- ggplot(log2.cpm.filtered.norm.df.pivot) +
aes(x=samples, y=expression, fill=life_stage) +
geom_violin(trim = FALSE, show.legend = T, alpha = 0.7) +
stat_summary(fun = "median",
geom = "point",
shape = 20,
size = 2,
color = "black",
show.legend = FALSE) +
labs(y="log2 expression", x = "sample",
title = paste0("C. ", x, ": Log2 Counts per Million (CPM)"),
subtitle="filtered, TMM normalized",
fill = "Life stage",
caption=paste0("produced on ", Sys.time())) +
theme_bw() +
theme(text = element_text(size = 10),
title = element_text(size = 10),
axis.text.y = element_text (size = 5)) +
coord_flip()
# Save outputs
output.name <- paste0(x, '_FilteringNormalizationGraphs')
save(p1, p2, p3, p.discarded, discarded.gene.df,
file = file.path(output.path,
output.name))
output.name <- paste0(x, '_FilteredNormalizedData')
save(myDGEList.filtered.norm, log2.cpm.filtered.norm,
file = file.path(output.path,
output.name))
# Save a matrix of discarded genes and their raw counts ----
output.name <- paste0(x, '_discardedGenes.csv')
discarded.gene.df %>%
write.csv(file = file.path(output.path,
output.name))
}
species_list <- tibble(species = c('elegans', 'briggsae', 'brenneri', 'remanei', 'japonica'))
for (x in species_list$species) {
# load pre-generated filtered and normalized data
load(paste0("./Outputs/",x,"_FilteredNormalizedData"))
# read in the study design ----
targets <- read_tsv(paste0("./Data/", x, "/", x,"_study_design.txt"),
na = c("", "NA", "na"), show_col_types = F)
if (x == 'elegans' | x == 'briggsae'){
# Compute variance-stabilized DGEList object ----
# Set up the design matrix ----
# means model (no intercept); comparisons are made across groups
group <- factor(targets$group)
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)
# Model the mean-variance trend and fit a linear model to the data ----
# Use the voom function from the limma package to model the mean-variance relationship.
# voom produces a variance-stabilized EList that includes a precision weight
# for each observation, to control for heteroscedasticity, and transforms the
# count data to log2-counts per million.
# Outputs: E = normalized expression values on the log2 scale
# Note: this object is called 'DGEList.filtered.norm' instead of
# 'v.DGEList.filtered.norm' to make it easier to work with the
# non-variance-stabilized data from other species.
DGEList.filtered.norm <- voom(counts = myDGEList.filtered.norm,
                              design = design, plot = TRUE)
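# Illustration (hypothetical toy example, not part of the pipeline): with
# group <- factor(c("L1", "L1", "L2")), model.matrix(~0 + group) returns one
# indicator column per factor level,
#   groupL1 groupL2
# 1       1       0
# 2       1       0
# 3       0       1
# which the colnames() call above renames to "L1" and "L2".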
colnames(DGEList.filtered.norm)<-targets$sample
colnames(DGEList.filtered.norm$E) <- paste(targets$group,
targets$uniqueness_suffix,sep = '-')
# Save matrix of genes and their filtered, normalized, voom-transformed counts in Shiny app download (www) folder ----
# This is the count data that underlies the differential expression analyses in the Shiny app.
output.name <- paste0(x, '_log2cpm_filtered_norm_voom.csv')
write.csv(DGEList.filtered.norm$E,
file = file.path(www.path,
output.name))
# Save variance-stabilized DGEList in Shiny app Data folder----
output.name <- paste0(x, '_DGEList')
save(DGEList.filtered.norm,
file = file.path(app.path,
output.name))
} else{
DGEList.filtered.norm <- new("EList", list(genes = myDGEList.filtered.norm$genes,
                                           targets = myDGEList.filtered.norm$samples,
                                           E = log2.cpm.filtered.norm))
colnames(DGEList.filtered.norm) <- targets$sample
colnames(DGEList.filtered.norm$E) <- paste(targets$group,
                                           targets$uniqueness_suffix, sep = '-')
# Save matrix of genes and their filtered, normalized counts in Shiny app download (www) folder----
output.name <- paste0(x, '_log2cpm_filtered_norm.csv')
write.csv(DGEList.filtered.norm$E,
file = file.path(www.path,
output.name))
# Save DGEList in Shiny app Data folder----
output.name <- paste0(x, '_DGEList')
save(DGEList.filtered.norm,
file = file.path(app.path,
output.name))
}
}
species_list <- tibble(species = c('elegans', 'briggsae', 'brenneri', 'remanei', 'japonica'))
for (x in species_list$species) {
# read in DGElist object
output.name <- paste0(x, '_DGEList')
load(file = file.path(app.path, output.name))
# read in the study design ----
targets <- read_tsv(paste0("./Data/", x, "/", x,"_study_design.txt"),
na = c("", "NA", "na"), show_col_types = F)
# Perform multivariate analysis on the data
# Identify variables of interest in study design file ----
group <- factor(targets$group)
# Hierarchical clustering ---------------
# Remember: hierarchical clustering can only work on a data matrix, not a data frame
# Calculate distance matrix
# dist() calculates distances between rows, so transpose the matrix to get distances between samples
# (i.e., how similar the samples are to each other)
distance <- dist(t(DGEList.filtered.norm$E), method = "maximum") # other distance methods are "euclidean", "manhattan", "canberra", "binary", and "minkowski"
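# Worked example (hypothetical, for intuition only): for two points a = (0, 0)
# and b = (3, 4),
#   dist(rbind(a = c(0, 0), b = c(3, 4)), method = "maximum")   # -> 4
#   dist(rbind(a = c(0, 0), b = c(3, 4)), method = "euclidean") # -> 5
# the "maximum" (Chebyshev) distance is the largest per-coordinate difference.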
# Calculate clusters to visualize differences. This is the hierarchical clustering.
# The methods here include: single (i.e. "friends-of-friends"), complete (i.e. complete linkage), and average (i.e. UPGMA). Here's a comparison of different types: https://en.wikipedia.org/wiki/UPGMA#Comparison_with_other_linkages
clusters <- hclust(distance, method = "complete") # other agglomeration methods are "ward.D", "ward.D2", "single", "average", "mcquitty", "median", and "centroid"
dend <- as.dendrogram(clusters)
p4 <- dend %>%
  dendextend::set("hang_leaves", 0.1) %>%
  dendextend::set("labels_cex", 0.5) %>%
  dendextend::set("branches_lwd", 0.7) %>%
  as.ggdend() %>%
  ggplot(offset_labels = -0.5) +
  theme_dendro() +
  ylim(-8, max(get_branches_heights(dend))) +
  labs(title = "Hierarchical Cluster Dendrogram",
       subtitle = "filtered, TMM normalized",
       y = "Distance",
       x = "Life stage") +
coord_fixed(1/2) +
theme(axis.title.x = element_text(color = "black"),
axis.title.y = element_text(angle = 90),
axis.text.y = element_text(angle = 0),
axis.line.y = element_line(color = "black"),
axis.ticks.y = element_line(color = "black"),
axis.ticks.length.y = unit(2, "mm"))
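# Optional check (a sketch, not part of the original analysis): the same
# 'clusters' object can be cut into k groups with cutree() to see whether
# samples co-cluster by life stage; k = 4 here is an arbitrary example.
# sample.clusters <- cutree(clusters, k = 4)
# table(sample.clusters, targets$group)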
# Principal component analysis (PCA) -------------
# this also works on a data matrix, not a data frame
pca.res <- prcomp(t(DGEList.filtered.norm$E), scale. = FALSE, retx = TRUE)
summary(pca.res) # prints the variance summary for all principal components
# pca.res$rotation # how much each gene contributes to each PC (the 'loadings')
# pca.res$x # where each sample falls on each PC (the 'scores')
# note that these have a magnitude and a direction (this is the basis for making a PCA plot)
## This generates a screeplot: a standard way to view the eigenvalues of a PCA. It shows the proportion of variance accounted for by each PC. Plotting only the first 10 dimensions.
p5<-fviz_eig(pca.res,
#barcolor = brewer.pal(8,"Pastel2")[8],
#barfill = brewer.pal(8,"Pastel2")[8],
linecolor = "black",
main = "Scree plot: proportion of variance accounted for by each principal component", ggtheme = theme_bw())
pc.var<-pca.res$sdev^2 # sdev^2 captures these eigenvalues from the PCA result
pc.per<-round(pc.var/sum(pc.var)*100, 1) # we can then use these eigenvalues to calculate the percentage variance explained by each PC
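# Sanity check (a sketch, not part of the original pipeline): the same
# percentages appear in the "Proportion of Variance" row of summary(pca.res):
# round(100 * summary(pca.res)$importance["Proportion of Variance", ], 1)
# should match pc.per (up to rounding).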
# Visualize the PCA result ------------------
# First, plot any two PCs against each other.
# We know where each sample falls on each PC (its scores), so let's plot
pca.res.df <- as_tibble(pca.res$x[,1:2]) %>%
add_column(group = targets$group)
# Plotting PC1 and PC2
p6<-ggplot(pca.res.df) +
aes(x=PC1, y=PC2,
label = group,
fill = group,
color = group
) +
geom_point(size=4) +
xlab(paste0("PC1 (",pc.per[1],"%",")")) +
ylab(paste0("PC2 (",pc.per[2],"%",")")) +
  labs(title = "Principal Components Analysis",
       subtitle = "Note: analysis is blind to sample identity.",
       fill = "Life stage",
       color = "Life stage") +
scale_x_continuous(expand = c(.3, .3)) +
scale_y_continuous(expand = c(.3, .3)) +
coord_fixed() +
theme_bw() +
theme(text = element_text(size = 10),
title = element_text(size = 10))
# Create a PCA 'small multiples' chart ----
if (ncol(pca.res$x) > 5) {
  pca.res.df <- pca.res$x[, 1:6]
} else {
  pca.res.df <- pca.res$x[, 1:ncol(pca.res$x)]
}
pca.res.df <- pca.res.df %>%
as_tibble() %>%
add_column(sample = targets$sample,
group = targets$group)
pca.pivot <- pivot_longer(pca.res.df, # data frame to be pivoted
cols = PC1:PC3, # column names to be stored as a SINGLE variable
names_to = "PC", # name of that new variable (column)
values_to = "loadings") # name of the new value column (these are the per-sample PC scores, despite the column name)
# New facet label names for PCs
PC.labs <- c(paste0("PC1 (", pc.per[1], "%)"),
             paste0("PC2 (", pc.per[2], "%)"),
             paste0("PC3 (", pc.per[3], "%)"))
names(PC.labs) <- c("PC1", "PC2", "PC3")
p7 <- ggplot(pca.pivot) +
  aes(x = sample, y = loadings, fill = group) + # other covariates could be 'painted' onto this plot via the 'fill' aes
  geom_bar(stat = "identity") +
  facet_wrap(~PC, labeller = labeller(PC = PC.labs)) +
labs(title="PCA 'small multiples' plot",
fill = "Life stage",
caption=paste0("produced on ", Sys.time())) +
scale_x_discrete(limits = targets$sample, labels = targets$group) +
theme_bw() +
theme(text = element_text(size = 5),
title = element_text(size = 5))+
coord_flip()
output.name <- paste0(x, '_PCAGraphs')
save(p4, p5, p6, p7,
file = file.path(output.path,
output.name))
}
load("./Outputs/elegans_FilteringNormalizationGraphs")
p1
p2
p3
p.discarded
DT::datatable(discarded.gene.df,
rownames = FALSE,
escape = FALSE,
options = list(autoWidth = TRUE,
scrollX = TRUE,
scrollY = '300px',
scrollCollapse = TRUE,
searchHighlight = TRUE,
pageLength = 10,
lengthMenu = c("5",
"10",
"25",
"50",
"100"),
initComplete = htmlwidgets::JS(
"function(settings, json) {",
paste0("$(this.api().table().container()).css({'font-size': '", "10pt", "'});"),
"}")
)) %>%
DT::formatRound(columns=c(2:(ncol(discarded.gene.df)-10)),
digits=3)
rm("p1", "p2", "p3", "p.discarded", "discarded.gene.df")
load("./Outputs/briggsae_FilteringNormalizationGraphs")
p1
p2
p3
p.discarded
DT::datatable(discarded.gene.df,
rownames = FALSE,
escape = FALSE,
options = list(autoWidth = TRUE,
scrollX = TRUE,
scrollY = '300px',
scrollCollapse = TRUE,
searchHighlight = TRUE,
pageLength = 10,
lengthMenu = c("5",
"10",
"25",
"50",
"100"),
initComplete = htmlwidgets::JS(
"function(settings, json) {",
paste0("$(this.api().table().container()).css({'font-size': '", "10pt", "'});"),
"}")
)) %>%
DT::formatRound(columns=c(2:(ncol(discarded.gene.df)-10)),
digits=3)
rm("p1", "p2", "p3", "p.discarded", "discarded.gene.df")
load("./Outputs/brenneri_FilteringNormalizationGraphs")
p1
p2
p3
p.discarded
DT::datatable(discarded.gene.df,
rownames = FALSE,
escape = FALSE,
options = list(autoWidth = TRUE,
scrollX = TRUE,
scrollY = '300px',
scrollCollapse = TRUE,
searchHighlight = TRUE,
pageLength = 10,
lengthMenu = c("5",
"10",
"25",
"50",
"100"),
initComplete = htmlwidgets::JS(
"function(settings, json) {",
paste0("$(this.api().table().container()).css({'font-size': '", "10pt", "'});"),
"}")
)) %>%
DT::formatRound(columns=c(2:(ncol(discarded.gene.df)-10)),
digits=3)
rm("p1", "p2", "p3", "p.discarded", "discarded.gene.df")
load("./Outputs/remanei_FilteringNormalizationGraphs")
p1
p2
p3
p.discarded
DT::datatable(discarded.gene.df,
rownames = FALSE,
escape = FALSE,
options = list(autoWidth = TRUE,
scrollX = TRUE,
scrollY = '300px',
scrollCollapse = TRUE,
searchHighlight = TRUE,
pageLength = 10,
lengthMenu = c("5",
"10",
"25",
"50",
"100"),
initComplete = htmlwidgets::JS(
"function(settings, json) {",
paste0("$(this.api().table().container()).css({'font-size': '", "10pt", "'});"),
"}")
)) %>%
DT::formatRound(columns=c(2:(ncol(discarded.gene.df)-10)),
digits=3)
rm("p1", "p2", "p3", "p.discarded", "discarded.gene.df")
load("./Outputs/japonica_FilteringNormalizationGraphs")
p1
p2
p3
p.discarded
DT::datatable(discarded.gene.df,
rownames = FALSE,
escape = FALSE,
options = list(autoWidth = TRUE,
scrollX = TRUE,
scrollY = '300px',
scrollCollapse = TRUE,
searchHighlight = TRUE,
pageLength = 10,
lengthMenu = c("5",
"10",
"25",
"50",
"100"),
initComplete = htmlwidgets::JS(
"function(settings, json) {",
paste0("$(this.api().table().container()).css({'font-size': '", "10pt", "'});"),
"}")
)) %>%
DT::formatRound(columns=c(2:(ncol(discarded.gene.df)-10)),
digits=3)
load("./Outputs/elegans_PCAGraphs")
p4
p5
p6
p7
rm("p4", "p5", "p6", "p7")
load("./Outputs/briggsae_PCAGraphs")
p4
p5
p6
p7
rm("p4", "p5", "p6", "p7")
load("./Outputs/brenneri_PCAGraphs")
p4
p5
p6
p7
rm("p4", "p5", "p6", "p7")
load("./Outputs/remanei_PCAGraphs")
p4
p5
p6
p7
rm("p4", "p5", "p6", "p7")
load("./Outputs/japonica_PCAGraphs")
p4
p5
p6
p7
sessionInfo()
## R version 4.3.2 (2023-10-31)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.4
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/Los_Angeles
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] dendextend_1.17.1 gridExtra_2.3 factoextra_1.0.7
## [4] ggdendro_0.2.0 ggforce_0.4.2 knitr_1.45
## [7] gprofiler2_0.2.3 ggthemes_5.1.0 cowplot_1.1.3
## [10] matrixStats_1.2.0 edgeR_4.0.16 limma_3.58.1
## [13] magrittr_2.0.3 data.table_1.15.2 lubridate_1.9.3
## [16] forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
## [19] purrr_1.0.2 readr_2.1.5 tidyr_1.3.1
## [22] tibble_3.2.1 ggplot2_3.5.0 tidyverse_2.0.0
## [25] pacman_0.5.1 tximport_1.30.0 BiocManager_1.30.22
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.0 viridisLite_0.4.2 farver_2.1.1 viridis_0.6.5
## [5] fastmap_1.1.1 lazyeval_0.2.2 tweenr_2.0.3 digest_0.6.34
## [9] timechange_0.3.0 lifecycle_1.0.4 ellipsis_0.3.2 statmod_1.5.0
## [13] compiler_4.3.2 rlang_1.1.3 sass_0.4.8 tools_4.3.2
## [17] utf8_1.2.4 yaml_2.3.8 ggsignif_0.6.4 labeling_0.4.3
## [21] htmlwidgets_1.6.4 bit_4.0.5 abind_1.4-7 withr_3.0.0
## [25] grid_4.3.2 polyclip_1.10-6 fansi_1.0.6 ggpubr_0.6.0
## [29] colorspace_2.1-1 scales_1.3.0 MASS_7.3-60.0.1 cli_3.6.2
## [33] rmarkdown_2.26 crayon_1.5.2 generics_0.1.3 rstudioapi_0.15.0
## [37] httr_1.4.7 tzdb_0.4.0 cachem_1.0.8 parallel_4.3.2
## [41] vctrs_0.6.5 carData_3.0-5 jsonlite_1.8.8 car_3.1-2
## [45] hms_1.1.3 bit64_4.0.5 ggrepel_0.9.5 rstatix_0.7.2
## [49] crosstalk_1.2.1 locfit_1.5-9.9 plotly_4.10.4 jquerylib_0.1.4
## [53] glue_1.7.0 DT_0.32 stringi_1.8.3 gtable_0.3.4
## [57] munsell_0.5.0 pillar_1.9.0 htmltools_0.5.7 R6_2.5.1
## [61] vroom_1.6.5 evaluate_0.23 lattice_0.22-5 highr_0.10
## [65] backports_1.4.1 broom_1.0.5 bslib_0.6.1 Rcpp_1.0.12
## [69] xfun_0.42 pkgconfig_2.0.3